Speaker diarization using divide-and-conquer
Abstract
Speaker diarization systems usually consist of two core components: speaker segmentation and speaker clustering. Current state-of-the-art systems typically apply hierarchical agglomerative clustering (HAC) after segmentation to cluster the speakers. However, the computational complexity of HAC is quadratic in the number of data samples, which limits its use on large-scale data sets. In this paper, we propose a divide-and-conquer (DAC) framework for speaker diarization. It works in three steps:
1. It recursively partitions the input speech stream into two sub-streams.
2. It performs diarization on each sub-stream separately.
3. It combines the two sets of diarization results using HAC.

Experiments on the RT-02 and RT-03 broadcast news data show that the proposed framework is faster than the conventional segmentation-and-clustering approach while achieving comparable diarization accuracy. Moreover, its speedup over the conventional approach is larger on a larger test data set.
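The recursion described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

* Segmentation has already produced a time-ordered list of segments.
* Each segment is summarized by a fixed-length feature vector.
* The stream is split by halving the segment list.
* HAC merges clusters by Euclidean distance between centroids, stopping at a fixed threshold. This criterion is a stand-in; the paper's actual merge criterion is not given here.

The names `dac_diarize`, `hac`, `threshold` and `leaf_size` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance (assumed stand-in for the real merge criterion).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _centroid(idx: List[int], feats: List[Vector]) -> Vector:
    # Mean feature vector of the segments whose indices are in idx.
    pts = [feats[i] for i in idx]
    return [sum(col) / len(pts) for col in zip(*pts)]


def hac(clusters: List[List[int]], feats: List[Vector],
        threshold: float) -> List[List[int]]:
    """Greedy agglomerative clustering over clusters of segment indices.

    Each step scans every pair of clusters (the pairwise cost that makes
    HAC expensive on large inputs) and merges the closest pair, stopping
    when no pair of centroids is closer than threshold.
    """
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        cents = [_centroid(c, feats) for c in clusters]
        best_d, best_i, best_j = math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = _dist(cents[i], cents[j])
                if d < best_d:
                    best_d, best_i, best_j = d, i, j
        if best_d > threshold:
            break
        clusters[best_i].extend(clusters[best_j])
        del clusters[best_j]
    return clusters


def dac_diarize(feats: List[Vector], threshold: float,
                leaf_size: int = 4) -> List[List[int]]:
    """Divide-and-conquer diarization sketch.

    Recursively halves the time-ordered segment list, diarizes each half,
    then runs HAC on the union of the two halves' clusters to combine them.
    Returns clusters as lists of segment indices (one list per speaker).
    """
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")

    def solve(idx: List[int]) -> List[List[int]]:
        if len(idx) <= leaf_size:
            # Small enough: plain HAC starting from singleton clusters.
            return hac([[i] for i in idx], feats, threshold)
        mid = len(idx) // 2
        left = solve(idx[:mid])    # diarize the first sub-stream
        right = solve(idx[mid:])   # diarize the second sub-stream
        return hac(left + right, feats, threshold)  # combine with HAC

    return solve(list(range(len(feats))))
```

For example, take 20 alternating segments from two synthetic "speakers": even indices near (0, 0) and odd indices near (10, 10). Calling `dac_diarize(feats, threshold=3.0)` groups the even and odd indices into two clusters. In this sketch, the full pairwise scans run only on small leaves and on the already-reduced cluster lists at each merge, rather than on all segments at once.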
Publication date: 2009